Reinforcement Learning Using a Stochastic Gradient Method with Memory-Based Learning

Authors

  • TAKAFUMI YAMADA
  • SATOSHI YAMAGUCHI
Abstract

In this paper, a learning algorithm that combines memory-less learning and memory-based learning is proposed for agents operating in partially observable Markov decision processes (POMDPs). In the first stage of the proposed algorithm, memory-less learning is applied, with the stochastic gradient method as the memory-less learning algorithm. During this stage, a series of state-action sets that accomplishes the task is stored in memory. In the second stage, memory-based learning is applied. Only the series obtained in the first stage is used in this process, so the method significantly reduces the amount of memory required. The proposed algorithm is applied to three simulations and compared with the memory-less learning algorithm. The computer simulations show that the proposed algorithm works more effectively in POMDPs than ordinary memory-less learning. © 2010 Wiley Periodicals, Inc. Electr Eng Jpn, 173(1): 32–40, 2010; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/eej.20963
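
The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the toy corridor environment, the softmax policy with a REINFORCE-style stochastic gradient update, and the second-stage learner (a value table keyed on the previous and current observation, trained only on the stored series). None of the names or parameters come from the paper.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy POMDP (not from the paper): a 5-cell corridor, goal at cell 4.
# Cells 1 and 3 emit the same observation, so the agent cannot tell them apart.
OBS = [0, 1, 2, 1, 3]      # observation emitted by each cell
ACTIONS = [-1, +1]         # index 0 = move left, index 1 = move right
GOAL, MAX_STEPS = 4, 20

def softmax_probs(theta, obs):
    """Action probabilities of the memory-less policy for one observation."""
    exps = [math.exp(theta[(obs, a)]) for a in range(len(ACTIONS))]
    total = sum(exps)
    return [e / total for e in exps]

def run_episode(theta, rng):
    """Roll out the policy; return the (observation, action) series and success."""
    cell, series = 0, []
    for _ in range(MAX_STEPS):
        obs = OBS[cell]
        a = 0 if rng.random() < softmax_probs(theta, obs)[0] else 1
        series.append((obs, a))
        cell = min(max(cell + ACTIONS[a], 0), GOAL)
        if cell == GOAL:
            return series, True
    return series, False

def stage_one(episodes=300, lr=0.1, seed=0):
    """Assumed stage 1: stochastic gradient ascent; store successful series."""
    rng = random.Random(seed)
    theta, memory = defaultdict(float), []
    for _ in range(episodes):
        series, success = run_episode(theta, rng)
        if not success:
            continue  # reward 0 contributes a zero gradient step
        memory.append(series)
        step = defaultdict(float)
        for obs, a in series:
            probs = softmax_probs(theta, obs)
            for b in range(len(ACTIONS)):
                # Gradient of log softmax: indicator(b == a) - pi(b | obs)
                step[(obs, b)] += (1.0 if b == a else 0.0) - probs[b]
        for key, g in step.items():
            theta[key] += lr * g  # reward is 1 for a successful episode
    return theta, memory

def stage_two(memory, alpha=0.5, gamma=0.9, sweeps=20):
    """Assumed stage 2: value table over (previous obs, current obs) contexts,
    trained by replaying only the series stored in stage 1."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for series in memory:
            prev_obs = None
            for t, (obs, a) in enumerate(series):
                context = (prev_obs, obs)
                if t + 1 < len(series):
                    nxt = (obs, series[t + 1][0])
                    target = gamma * max(q[(nxt, b)] for b in range(len(ACTIONS)))
                else:
                    target = 1.0  # final step reaches the goal
                q[(context, a)] += alpha * (target - q[(context, a)])
                prev_obs = obs
    return q

theta, memory = stage_one()
q = stage_two(memory)
```

Because the second stage only replays the stored successful series, its table holds entries only for observation contexts that actually occurred on those series, which is one plausible reading of the memory saving the abstract describes.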

Similar resources

Accelerating Stochastic Composition Optimization

Consider the stochastic composition optimization problem where the objective is a composition of two expected-value functions. We propose a new stochastic first-order method, namely the accelerated stochastic compositional proximal gradient (ASC-PG) method, which updates based on queries to the sampling oracle using two different timescales. The ASC-PG is the first proximal gradient method for t...

Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm

In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it is first formulated as a Markov Decision Process (MDP). Next, Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...

Recurrent policy gradients

Reinforcement learning for partially observable Markov decision problems (POMDPs) is a challenge as it requires policies with an internal state. Traditional approaches suffer significantly from this shortcoming and usually make strong assumptions on the problem domain such as perfect system models, state-estimators and a Markovian hidden system. Recurrent neural networks (RNNs) offer a natural ...

A Novel Fuzzy Based Method for Heart Rate Variability Prediction

In this paper, a novel technique based on a fuzzy method is presented for chaotic nonlinear time series prediction. A fuzzy approach with the gradient learning algorithm and methods constitutes the main components of this method. The learning process in this method is similar to the conventional gradient descent learning process, except that the input patterns and parameters are stored in mem...

Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network

Because of the existing interactions among the variables of a multiple input-multiple output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. Cement rotary kiln (CRK) is a MIMO nonlinear system in the cement factory with a complicated mechanism and uncertain disturbances. The identification of CRK is very important for different pur...

Journal title: Electr Eng Jpn

Volume 173, Issue 1

Pages 32–40

Publication date: 2010